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An improved Markov-chain Monte Carlo sampler for the estimation 
of cosmological parameters from CMB data 
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ABSTRACT 

Markov-chain Monte Carlo sampling has become a standard technique for exploring the 
posterior distribution of cosmological parameters constrained by observations of CMB 
anisotropics. Given an infinite amount of time, any MCMC sampler will eventually converge 
such that its stationary distribution is the posterior of interest. In practice, however, naive 
samplers require a considerable amount of time to explore the posterior distribution fully. In 
the best case, this results only in wasted CPU time, but in the worse case can lead to un- 
derestimated confidence limits on the values of cosmological parameters. Even for the current 
CMB data set, the sampler employed in the widely-used COSMO-MC package does not sample 
very efficiently. This difficulty is yet more pronounced for data sets of the quality anticipated 
for the Planck mission. We thus propose a new MCMC sampler for analysing total intensity 
CMB observations, which can be easily incorporated into the COSMO-MC software, but has 
rapid convergence and produces reliable confidence limits. This is achieved by using dynamic 
widths for proposal distributions, dynamic covariance matrix sampling, and a dedicated pro- 
posal distribution for moving along well-known degeneracy directions. 

Key words: cosmic microwave background - methods: data analysis - methods: statistical. 



1 INTRODUCTION 

Markov-chain Monte Carlo (MCMC) sampling is a universal tech- 
nique used to explore high-dimensional density fields and has 
several advantages over more conventional grid-based approaches 
(see, for example, Gilks, Richardson & Speigelhalter 1995). In an 
astrophysical context, the MCMC approach was first applied to the 
determination of cosmological parameters from estimates of the 
total intensity cosmic microwave background (CMB) power spec- 
trum by Christensen et al. (2001) and Knox, Christensen & Skordis 
(2001). More recently Lewis & Bridle (2002) released the publicly- 
available COSMO-MC software package for this purpose. MCMC 
sampling methods have also recently been applied to the detection 
and characterisation of Sunyaev-Zel'dovich clusters in maps of pri- 
mordial CMB anisotropics (Hobson & McLachlan 2002). 

In an MCMC algorithm, a Markov chain is constructed whose 
equilibrium distribution is the density field (or posterior) of interest 
p(x). Thus, after propagating the Markov chain for a given burn-in 
period, one obtains (correlated) samples from p{x), provided the 
Markov chain has converged. A Markov chain is characterised by 
the fact that the state is drawn from a distribution (or tran- 
sition kernel) that depends only on the previous state of the chain 
jc„, and not on any earlier state. In its simplest form an MCMC 
sampler can be constructed using the Metropolis algorithm as fol- 
lows. At each step n in the chain, the next state x^+i is chosen by 
first sampling a candidate point x' from some symmetric proposal 
distribution q{x\x„) = q(x„\x), which may in general depend on the 



current state of the chain jc„. The candidate point jc' is then accepted 
with probability 



ifp{x')> p{x), 



p{x)/p{x') otherwise. 



(1) 



If the candidate point is accepted, the next state becomes x„+\ =x' , 
but if the candidate is rejected, the chain does not move, so x„^i = 
x„. In the limit of infinite number of chain steps, the density of the 
samples is proportional to the density p{x), provided the proposal 
function: (i) satisfies detailed balance, which requires that the prob- 
ability of proposing x' from x is the same as probability to propos- 
ing x fromx', and (ii) is such that every point in the parameter space 
has a finite probability of being proposed. 

Given an infinite number of steps any sampler satisfying the 
above conditions will eventually reach a converged state in which 
the samples are representative of target density p{x). In practice, 
however, the number of steps required to achieve convergence and 
fully explore the target density can vary dramatically depending 
on the form of the proposal distribution. A good proposal function 
produces candidate points that have a high probability of being ac- 
cepted and ensures good mobility of the chain around the target 
density. 

A common choice for the proposal function, used for exam- 
ple by Knox et al. (2001), is a multivariate Gaussian centred on 
the current chain position, with fixed widths o, in each parame- 
ter direction. There are, however, some problems associated with 
using this proposal function alone as the single MCMC 'engine'. 
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Firstly, the proposal widths must be chosen with considerable care 
and tailored to the application in hand. If the proposal widths are 
set too large, the acceptance rate will be very low, so that the chain 
remains stuck at a single point for a considerable number of steps, 
sometimes indefinitely in practical terms. On the other hand, if the 
proposal widths are too narrow, the acceptance rate will be high, 
but the chain has a limited mobility because it effectively executes 
a random walk and thus diffuses only very slowly around the target 
density. In particular, this can lead to severely underestimated con- 
fidence limits on parameters and spurious peaks in the sample dis- 
tribution associated with the initial chain position. A second prob- 
lem with the standard multivariate Gaussian proposal distribution is 
that, in high-dimensional problems, the degeneracy directions often 
take the form of small 'tunnels' in the target density, which are un- 
likely to be explored by chance. Even with appropriate proposal 
widths the chain must 'zig-zag' through such structures and there- 
fore these degeneracy directions are likely to be undersampled. 

In an attempt to speed up the sampling process, the COSMO- 
MC package (Lewis &. Bridle 2002) employs an number of de- 
vices that improve upon the simple multivariate Gaussian proposal 
function; these are explained in Section 2 below. This widely-used 
package has proved very successful in analysing CMB power spec- 
trum measurements from ground-based and balloon-bourne experi- 
ments. Nevertheless, as we show in Section 4, with the inclusion of 
WMAP data (Bennett et al. 2003), which is cosmic variance lim- 
ited out to ^ 350, the COSMO-MC sampler still suffers from a 
number of the disadvantages listed above. In particular, it produces 
marginalised probability distributions for some cosmological pa- 
rameters that yield underestimates of confidence limits by up to a 
factor of two. This undersampUng of the target density is yet more 
pronounced for cosmic variance limited CMB data out iol^ 2000, 
as is expected from the Planck mission. 

In this paper, we therefore present a new sampler module 
(called COG ) that is trivially substituted for the existing sampler in 
the COSMO-MC software. This sampler employs a number of strate- 
gies to avoid the difficulties encountered in the use of the standard 
multivariate Gaussian proposal function and the native COSMO-MC 
sampler. These strategies are explained in detail in Section 3, but 
the single most important advantage of the COG sampler is its use 
of the very well-known cosmological parameter degeneracies for 
CMB data (Efstathiou & Bond 1999), hence ensuring good mobil- 
ity around the target density. In Section 4, we test the COG sampler 
on the current CMB data set and a simulated data set of the quality 
expected from the Planck mission. In addition to producing reliable 
marginalised distributions, the COG sampler also requires far fewer 
evaluations of theoretical Cf spectra to explore the target density 
fully, and therefore provides speed up by a factor ~ 4 over the stan- 
dard COSMO-MC software when analysing the current CMB data 
set. In the analysis of the Planck-like data set, this speed increase 
rises to a factor of ~ 50. Our conclusions are presented in Section 5. 



2 THE COSMO-MC SAMPLER 

The COSMO-MC sampler uses CAMB (Lewis, Challinor & Lasenby 
2000) as its underlying theoretical CMB power spectrum genera- 
tor. The proposal density employed in this single-chain sampler is 
based on a multivariate Gaussian, but is tailored specifically to ex- 
ploit the difference between 'fast' and 'slow' parameters in CAMB, 
in order to increase the speed with which the target density may be 
sampled. 

At any given point x„ in the chain, it is much quicker to cal- 



culate the theoretical spectrum (and hence the value of the pos- 
terior) at the next candidate point x' if some of the parameter val- 
ues are the same for both points. In particular, since the pertur- 
bation evolution is assumed linear, once the transfer function for 
each wavenumber has been computed, it is fast to calculate the Ci 
spectrum corresponding to changes in any parameters governing 
the initial primoridal power spectra of scalar and tensor perturba- 
tions; these are thus termed 'fast' parameters. On the other hand, if 
one changes parameters governing the perturbation evolution, the 
resulting Cf spectrum will be much slower to compute since it re- 
quires a detailed recalculation of the linear physics; these are thus 
termed 'slow' parameters. 

In detail, the basic sampler works as follows. First it performs 
an initial sampling of the target density with a proposal density 
based on a multivariate Gaussian distribution determined by a set 
of user defined proposal widths o,-, one for each parameter. It begins 
by placing the parameters in random order in a 'queue'. If the first 
parameter in the queue is fast, all the fast parameters are updated 
to obtain the next candidate point x' . If the first parameter in the 
queue is slow, both this parameter and the next 0-2 parameters in 
the queue are updated (the number of additionally updated param- 
eters being drawn randomly from a uniform distribution). In either 
case, the corresponding candidate point is then either accepted or 
rejected in the usual manner. In subsequent iterations, the above 
process is repeated, each time starting from the next parameter in 
the queue. Once the entire queue has been looped over twice, the 
parameter order is again randomly reshuffled, and the process re- 
peated. For each parameter that is updated, the new parameter value 
is drawn from a Gaussian distribution centred at the present pa- 
rameter value and having width o^. = o,/ \/N, where N is the total 
number of parameters changed. The extra 1 /ViV factor ensures that 
proposals in which many parameters change still have a reasonable 
acceptance rate. 

After the initial sampling stage, one then has the option of per- 
forming a complete resampling of the the target distribution using 
information gained from the inital stage. In particular, the empir- 
ical covariance matrix of the initial set of samples is calculated, 
which is then diagonalised to obtain a set of principal directions 
which are taken as the new 'parameters'. These new parameters 
are then placed in random order in a queue as above, and at each 
proposal between 1 and 3 of them are updated (the precise num- 
ber of updated parameters again being chosen at random). Each 
time the queue has been looped through twice, it is again randomly 
reshuffled. We note that a covariance matrix for the standard 6- 
parameter inflationary flat ACDM model is provided in advance in 
the COSMO-MC package. Thus, in this case, one can dispense with 
the initial sampling stage altogether. 

In spite of the increased sampling speed achieved by the above 
devices, this basic sampler still suffers from the difficulties dis- 
cussed in Section 1 Most notably, the user-supplied proposal widths 
O/ must still be chosen with considerable care to avoid acceptance 
rates that are either too low (so the chain becomes stuck) or too 
high (so the chain mobility is limited, leading to underestimated 
confidence limits). Also, the use of a multivariate Gaussian pro- 
posal function can lead to undersampling along narrow degeneracy 
directions. 



3 THE COG SAMPLER 

The COG sampler is a replacement MCMC engine for 
the COSMO-MC software. It may be downloaded from 
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http://www.mrao.cain.ac.uk/~aiize/cog and included 
easily into the COSMO-MC package. It does, however, also require 
the Gnu Scientific Library for the integration of the degeneracy 
quantities discussed below. 

The sampler is composed of four separate MCMC engines: 
fast parameter changes (El), all parameter changes (E2), principal 
direction changes (E3), degeneracy direction changes (E4); each of 
these engines is explained below. At any given step in the chain, 
only one of engines is used to propose the next candidate point jc'. 
Which engine is used is decided randomly at each step according 
to a set of relative probabilities fixed by the user at compilation. To 
allow for the engines to respond dynamically to the stucture of the 
target density, at regular intervals (the number of chain steps defin- 
ing an 'interval' being fixed by the user at compilation: default 300) 
the engines El, E2 and E3 are overhauled. In particular, for El and 
E2, the proposal widths o, are adjusted to ensure reasonable ac- 
ceptance rates (see below). For E3, the empirical covariance matrix 
of recent samples is recalculated. We note that an overhaul is only 
performed after a rejection of a candidate point, thus ensuring that 
detailed balance is maintained. 

3.1 Fast parameter changes 

During a fast parameter change (engine El), only fast parameters 
in CAMB are updated. Each fast parameter is assigned a probability 
pi of being updated in any given El proposal. These are defined by 
the user at compilation. Note that the p, need not sum to unity. If 
N fast parameters are updated, each new parameter value is drawn 
from a Gaussian distribution centred at the present parameter value 
and having width = Qi/\fN. For the first sampling interval the 
a,- are those supplied by the user at compilation. In contrast to the 
COSMO-MC sampler, however, the proposal width o, for each pa- 
rameter is then updated at each overhaul, based on the acceptance 
rates achieved for that parameter in the previous sampling inter- 
val. If the average acceptance rate of a given parameter is less than 
a user selected target (3 (defined at compilation: default 0.4), it is 
likely that the proposal width for this parameter is too large and so 
the corresponding o,- is decreased by a fixed factor (defined at com- 
pilation: default 0.8). Similarly, if the acceptance rate is less than 
p the proposal width is probably too narrow and is thus increased 
by a fixed factor (defined at compilation: default 1.2). As we show 
in Section 4, the proposal width for each parameter eventually set- 
tles down to a stable value appropriate to the target density being 
sampled. 

3.2 All parameter changes 

The all-parameter-change engine (E2) operates in an identical man- 
ner to El, except that in this case the full set of fast and slow pa- 
rameters may be updated. 

3.3 Principal-direction changes 

The engine (E3) implements prinicipal-direction changes and 
makes use of the covariance information collected from early sam- 
ples. This engine is only switched-on once the chain has taken 
a given nimiber of steps (defined at compilation: default 200), at 
which point the empiricial covariance matrix of the parameters is 
calculated using preceeding samples; an upper limit on the number 
of preceeding samples used in the calculation is defined at com- 
pilation (default 2(K)0). The covariance matrix is then updated at 



each overhaul. The eigenvectors and eigenvalues of the covariance 
matrix are then determined, which yield the principal directions in 
which changes may be proposed and the corresponding Gaussian 
proposal widths. In an analogous manner to engines El and E2, 
each principal direction is assigned a probability p, of being up- 
dated (defined at compilation). Once again the pi need not sum to 
unity. 

This approach has two advantages over the covariance matrix 
method used in the COSMO-MC sampler. Firstly, the covariance ma- 
trix corresponds to the local degeneracy directions and is thus op- 
timised for the current position of the chain. Secondly, it alleviates 
the need to run the entire sampling process twice, thus making the 
computation requirements considerably smaller. 



3.4 Degeneracy-direction changes 

The degeneracy directions engine (E3) is the most important in- 
novation in the COG sampler, and ensures good mobility of the 
chain around the target density. This engine takes advantage of the 
fact that there are degeneracies in cosmological parameter space 
that are intrinsic to any CMB observation and cannot be broken 
even with cosmic- variance limited data. Some of these degenera- 
cies have been extensively analysed in Efstathiou and Bond (1999). 
In particular, we implement motion of the chain along the two most 
important degeneracies: namely the geometrical degeneracy and 
the peak position degeneracy. In addition, we also implement mo- 
tion of the chain along the degeneracy directions spanned by the 
parameters (zrei^s) and (cObjWs)- The probability pi of updating 
along each degeneracy direction is defined at compilation (default 
0.25). 



3.4.1 Geometrical degeneracy 

The geometrical degeneracy is a nearly exact degeneracy in cos- 
mological parameter space for CMB data; it therefore allows large 
movement of the chain in the space. According to Efstathiou and 
Bond (1999), two cosmological models are degenerate if they have 
the same physical matter densities (% and OOdm, the same primordial 
scalar and tensor fluctuation spectra and the same value of parame- 
ter 

^.^aTySial'^y), (2) 
where cOm = ©b + cOdm. y is given by the integral 



and 

5W = ^. (4) 

X 

For a flat universe the function S equals unity, and for closed uni- 
verses it simplifies to sine (V— <»,(;}'). The integral (3) may be evalu- 
ated numerically, where the lower limit is obtained using the fit- 
ting formula for the redshift of recombination given by Hu and 
Sugiyama (1995), namely 

zr = 1048[l+0.00124co^''™][l+gicC], (5) 
^1 = 0.0783(0^'' [1 + 39.5cog''^3]-i , 
g2 = 0.560[l-t-21.1o)^-^i]-i. 
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The quantity ^ is tlius a function of tiie parameter set 
(c0b,tt)m,C0A,(0k)- Thcsc parameters can be calculated from the 
equivalent COSMO-MC parametrisation ({n/,,C0dm,i^J:,'')- 

When a geometric degeneracy change is proposed, the param- 
eter 'J{_ is calculated for the current chain position x. A fixed-width 
Gaussian change in coa is proposed and the direction is then 
searched using a numerical minimiser imtil a model with a match- 
ing is found. Finally, the new set of parameters are converted 
back to the standard COSMO-mc parametrisation for this new can- 
didate point jc'. The proposal width in co^ is set at compilation and 
can be quite large (default 0.1), since the degeneracy is almost ex- 
act so candidate points should have a high acceptance rate. Indeed, 
the difference in CMB power spectrum between models x and x' is 
below the level of cosmic variance. However, due to inaccuracies 
in above approximations, as well as intrinsic nvunerical inaccura- 
cies in the camb code, the acceptance rate for these changes is 
less than vmity. Finally, we note that although the mmierical min- 
imisation step involves many numerical integrations, the time spent 
searching this space is negligible compared to the calculation of a 
single CMB power spectrum. 

3.4.2 Peak position degeneracy 

Another important degeneracy is the first peak position degener- 
acy. Although not as exact as geometrical degeneracy it still allows 
the chain to make large steps in the parameter space. Following 
Efstathiou & Bond (1999), the position of the first peak is approxi- 
mately given by 



Id ^ 0.7467t\/3ar ^'^tt-^ — ^ 



where 



_i/2 f"' da 

Is=ai ' / — , 

^0 ^{a + aeq){l+R) 



(6) 



(7) 



in which Ogq is the normalised scale factor at matter-radiation 
equality and R = (3pi/4py). To a good approximation 

24185 f V"" (8) 



R 



30496cOi,a. 



(9) 



Thus id is also a function of the parameter set {(£)h,(>im,(£)/^,({ii^). 
Since (Oj, and cOm are not required to be fixed there is consider- 
ably more freedom in choosing which parameters to change. In the 
present implementation ©m is always changed with co;, and coj. be- 
ing changed with some finite probability. The co^ direction is then 
numerically searched for a model with a matching first peak posi- 
tion. 

We note in passing that the third degeneracy mentioned in Ef- 
stathiou & Bond (1999), i.e. the height of the first peak degeneracy, 
is not very relevant to future analysis, since it is already broken by 
the existing CMB dataset. 

3.4.3 The Zre-Ag degeneracy 

Another well-known approximate physical degeneracy in the cos- 
mological parameter space for CMB data is that between Og and 
exp(— x) (see e.g. Lewis & Bridle 2002) This occurs because the 
CMB power spectrum on scales smaller than the horizon size at 
reionisation is damped by a factor exp(— 2t), and Og scales with 
the power in the primordial perturbation. The corresponding pa- 
rameters in the COSMO-MC software are zre and As. We note that 



Zre is a 'slow' parameter, whereas As is a 'fast' one. When a move 
in Zre is proposed, the sampler additionally proposes up to 30 (or a 
number selected at the compilation time) proposals in Ag. The pro- 
posal probability distribution function for As is a half Gaussian with 
the correct orientation (i.e. increasing Zve requires increasing As and 
therefore sampler always proposes change which has the right ori- 
entation with respect to the degeneracy direction). Since As is a fast 
parameter the computational cost is essentially just one CAMB call. 

3.4.4 The cOb-Ws degeneracy 

There is a weak degeneracy in the parameters (^-n^. We have im- 
plemented a degeneracy sampling system that is analogous to that 
described in the previous section with the slow parameter Wb and 
the fast parameter ns. 

3.5 Bum-in, convergence and annealing 

As mentioned in Section 1, any MCMC sampler requires a bum-in 
period for the chain to reach equilibrium and hence sample from the 
target distribution p(x). Unfortunately, there exists no formula for 
determining the length of the bum-in period, or for confirming that 
a chain has reached equilibrium. Indeed, the topic of convergence 
is still a matter of ongoing statistical research. Nevertheless, sev- 
eral convergence diagnostics for determining the length of bum-in 
have been proposed. These employ a variety of theoretical meth- 
ods and approximations that make use of the output samples from 
the Markov chain. A review of such diagnostics is given by Cowles 
& Carlin (1994). It is worth noting that running several parallel 
chains, rather than a single long chain, can aid the diagnosis of 
convergence, as discussed below in Section 4. 

It is also useful during bum-in to employ an annealing sched- 
ule in which the target density is raised to some power X, which 
varies gradually from zero to unity. Thus, one begins sampling from 
a modified posterior with A, = and slowly raises A- according to 
some (geometric) annealing schedule until X, = 1. This allows the 
chain to sample from remote regions of the posterior distribution, 
which in tum facilitates extensive chain mobility and ensures more 
reliable relaxation of the chain into the global optimum. This ap- 
proach also has the convenient by-product of yielding an estimate 
from the burn-in samples of the Bayesian evidence for the model 
under consideration (see, for example, Hobson & McLachlan 2002; 
Slosar et al. 2003). 



4 APPLICATION TO CMB DATA 

To test the quality of our improved sampler, we have performed 
parameter estimation in a 7-parameter inflationary ACDM model 
the using the two separate data sets outlined below. We have com- 
pared the results from the COG sampler with those obtained using 
the standard COSMO-mc software. For both samplers, the initial 
starting position of the chain and the initial proposal widths o,- 
for each parameter were identical, and are listed in Table 1. Also 
listed are the limits of the top-hat priors adopted for the parame- 
ters. Both the COSMO-MC and COG samplers were run until 6000 
post bum-in accepted samples had been collected for each chain. 
For both samplers, eight independent chains were mn on separate 
nodes of a Beowulf cluster. Thus, the total number of post burn- 
in accepted samples was 48000. This approach also allows one to 
compare samples obtained from different individual chains, which 
aids the determination of convergence. 
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parameter 


Xo 


top-hat prior 






0.023 


(0.005,0.1) 


0.01 




0.127 


(0.01,0.9) 


0.1 


h 


68 


(40, 100) 


15 


Zte 


13 


(6,20) 


6 


iik 


0.0 


(-0.3,0.3) 


0.06 


"s 


1.0 


(0.7,1.3) 


0.1 


As 


23 


(10,50) 


15.0 



Table 1. The initial chain position ;ico. the range of top-hat prior, and the 
initial proposal width a, used for each parameter. The a,- were chosen to 
be small to give the original COSMO-MC sampler some chance of exploring 
the cosmic variance limited model. 
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Figure 1. The average acceptance rate as a function of the number of sam- 
pler calls for the COSMO-MC sampler (thin line) and the COG sampler (thick 
line) in the analysis of data set 1 . The vertical lines indicate the end of the 
burn-in period for the COSMO-MC (thin dashed line) and COG (thick dashed 
line) samplers. 

4.1 Current CMB data 

The first data set (data set 1) consists of WMAP, VSA, ACBAR and 
CBI band-power measurements of the total intensity CMB power 
spectrum (Bennett et al. 2003; Grainge et al. 2003; Kuo et al. 2003; 
Pearson et al. 2003); for £ < 800 only points from the WMAP ex- 
periment are used. In the calculation of the WMAP likelihood we 
use an adapted version of the publicly-available code (Kogut et al. 
200; ffinshaw et al. 2003; Verde et al. 2003). 

4.1.1 Acceptance rates and sampler performance 

The average acceptance rates for the COSMO-MC and COG sam- 
plers are plotted in Fig. 1 as a function of the number of calls to 
the sampler. The original COSMO-MC sampler was supplied with 
the precomputed basic 6-parameter covariance matrix that comes 
supplied with the COSMO-MC software; this includes all the pa- 
rameters given in Table 1 except for Q.^. For this last parameter, 
COSMO-MC uses a Gaussian proposal distribution with the width 
indicated in the table. The provision of the covariance matrix in ad- 
vance gives the cosmo-mc sampler some advantage as compared 
with its own 'initial' sampling phase. Nevertheless, the COG sam- 
pler at first equals this acceptance rate, and very soon surpasses it, 
once it has learnt the appropriate length scales of the posterior. Af- 
ter many samples, the acceptance rate for the COSMo-mc sampler 



is found to be around 0.1. To obtain 6000 post burn-in accepted 
samples, 65000 calls to the sampler were made. By contrast, the 
average acceptance rate of the cog sampler is around 0.4 and only 
17000 calls to sampler were required. 

In Fig. 2 (top panel), we plot the acceptance rates for each of 
the four MCMC engines used in the COG sampler as a function of 
the nimiber of overhauls, which are performed after every 300 chain 
steps. As one might expect, the engines El and E2 have acceptance 
rates of around 0.4, since the proposal widths for the parameters 
are chosen to achieve this target. More interestingly, the E3 en- 
gine also has an acceptance rate of almost 0.4, once equilibrium has 
been achieved. We note, however, that the initial acceptance rate for 
E3 is zero, since there are not enough samples to compute the co- 
variance matrix. For E4 (degeneracy direction changes), one might 
have expected the acceptance rate to be high, since the change in the 
theoretical CMB power spectra corresponding to such a proposal is 
generally quite small. Nevertheless, inexactness in the degeneracies 
and small numerical inaccuracies in both in CAMB and the calcu- 
lation of the degeneracy parameters means that the acceptance rate 
is reduced somewhat below unity, although its equilibrium value 
of around 0.6 is higher than for the other engines. Moreover, suc- 
cessful proposals along the degeneracy directions take the chain a 
considerable distance from its original position and are thus very 
important. 

In Fig. 2 (middle and bottom panels), we plot the proposal 
widths o,- for the slow and fast parameters respectively, as a func- 
tion of overhaul number; each width is expressed as a fraction of its 
initial value as given in Table 1. We see that each proposal widths 
initially decrease and then start oscillating around their equilibrium 
values. This oscillation could be avoided by adding a damping term 
to the proposal width change function, but we believe that oscillat- 
ing gives the sampler opportunity to sample a larger range of dis- 
tances and potentially escape local minima while at the same time 
maintaining high acceptance rate. Additionally we note that the 
requirement to maintain an acceptance rate of 0.4 means that the 
proposal widths are considerably smaller than widths of inferred 
uncertainties in parameters. We have performed simulations on a 
multivariate Gaussian and results show that this is indeed an effec- 
tive sampling technique. 



4. 1.2 Inferred limits on cosmological parameters 

The marginalised probability distributions of the cosmological pa- 
rameters obtained from the COSMO-MC and COG samplers are 
shown in Fig. 3. We see that the distributions for the parameters (%, 
O^dm- "s, &e and As are consistent within the sampling uncertain- 
ties for the two samplers. We note, however, that the marginalised 
distributions for h, Q.^ and Slj^ differ somewhat. In particu- 
lar, we see that the distributions produced by the COSMO-MC sam- 
pler are significantly narrower than those obtained using the COG 
sampler. This provides a useful illustration of the poorer mobil- 
ity of the chain in the COSMO-MC sampler along the geometrical 
degeneracy. This leads to underestimation of the confidence limits 
on the associated parameters. Indeed, in some cases, the limits are 
underestimated by around a factor of two. Conversely, the explicit 
degeneracy direction engine in the COG sampler allows its chain 
to move freely around the parameter space. This ensures that the 
marginalised distributions reflect the proper structure of the poste- 
rior. It should be noted, however, that the underestimation of the 
confidence limits by the COSMO-MC sampler is usually not a seri- 
ous problem, since an informative prior on h breaks the geometrical 
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Figure 3. The marginalised distributions for the cosmological parameters as 

1 infen'ed from data set 1 using the original COSMO-MC sampler (thin Une) 

^° *° ^° and the COG sampler (thick line) with a chain length of 6500. 
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Figure 2. Top: the acceptance rates in the analysis of data set 1 for each 
of the four MCMC engines used in the COG sampler as a function of the 
number of overhauls, which are performed after every 300 chain steps - fast 
parameter (El: thin solid line), all parameters (E2: dashed line), principal 
directions (E3: dotted line), degeneracy directions (E4: dot-dashed line). 
The overall acceptance rate for the sampler is shown as the thick solid line. 
Middle: the proposal widths Cj for each slow parameter as a function of 
overhaul number. Each width is expressed as a fraction of its initial value 
given in Table 1 , for each slow parameter as a function of overhaul number. 
Bottom: as above, but for the fast parameters. 



degeneracy sufficiently that the resulting marginalised distributions 
are reasonably accurate. 



4.2 Future CMB data 

The second data set (data set 2) consists of simulated cosmic- 
variance limited measurements of the CMB power spectrum at each 
multipole out to £ = 2000. Such data is expected from the forth- 
coming Planck satellite mission. In detail, a target concordance 
model was chosen with parameter values cOf, = 0.023, COdn, = 0. 127, 
h = 0.68, ih = 1.0, Zre = 16, As = 25 and % = 0.0. The corre- 
sponding theoretical CMB power spectrum was calculated using 
CAMB with the flat-universe code switched off. In an earlier anal- 
ysis of data set 2, we noticed that discontinuities occurred in the 
sample distribution which were associated with the switch over in 
the CAMB code from flat universes to universes with arbitrary cur- 
vature. This inaccuracy in the CAMB code is only important when 
analysing data of such high precision, and has been previously 
identified (Lewis, private communication). To create data set 2, we 
assumed that the cosmic variance limit can be well-approximated 
by a Gaussian distribution centred on the true Ci value and having 
a dispersion given by 

aq=/ii:q. (10) 

In practice, with real data, one should take into account that the 
probability of any given Q value follows a x^-distribution; this is 
particularly important at low £ values. 
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Figure 4. As in Fig. 1, but for data set 2. 



4.2. 1 Acceptance rates and sampler performance 

The average acceptance rates for the COSMO-MC and COG samplers 
are plotted in Fig. 4 as a function of the number of calls to the sam- 
pler. Once again, the original COSMO-MC sampler was given the 
precomputed basic 6-parameter covariance matrix that comes sup- 
plied with the COSMO-MC software, which gives the COSMO-MC 
sampler some initial advantage. One sees, however, that the COG 
sampler soon overtakes and, in this case, there is a very large differ- 
ence between the equilibrium acceptance rates of the two samplers. 
After many samples, the acceptance rate for the COSMO-MC sam- 
pler is found to be around 0.01. Even after 70000 sampler calls for 
each chain, only around 600 post burn-in accepted samples were 
obtained for the best-performing chain. Conversely, the COG sam- 
pler achieved an equilibrium acceptance rate of 0.22, and required 
only 28000 sampler calls to obtain 6000 post bum-in accepted sam- 
ples for each chain. We also note that, as a result of the simulated 
annealing used in the COG sampler, it required only 3100 sampler 
calls were required to bum-in, as compared with 10000 sampler 
calls for COSMO-MC (in the latter case, the end of burn-in was de- 
termined interactively by examining the posterior values associated 
with the chain samples). 

In Fig. 5 (top panel), we plot the acceptance rates for each of 
the four MCMC engines used in the COG sampler as a function of 
the number of overhauls. Once again, by constmction, the engines 
El and E2 have acceptance rates of around 0.4 once equilibrium has 
been achieved. For E3 (principal direction changes), we again see 
that the initial acceptance rate is zero, since there are not enough 
samples to compute the covariance matrix, but in this case the ac- 
ceptance rate remains low, reaching an equilibrium value of only 
0.05. We believe this behaviour is associated with the fact that, for 
data set 2, the corresponding multivariate Gaussian proposal dis- 
tribution is a poorer approximation to the true posterior than for 
data set 1. For E4 (degeneracy direction changes), we see that the 
inexactness in the degeneracies and small numerical inaccuracies 
in both in CAMB and the calculation of the degeneracy parame- 
ters have a more profound effect for the analysis of data set 2, re- 
sulting in a lower equilibrium acceptance rate of around 0.25. As 
for data set 1, however, successful proposals along the degeneracy 
directions take the chain a considerable distance from its original 
position and are thus very important. 
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Figure 5. As in Fig. 2, but for data set 2. 



4.2.2 Inferred limits on cosmological parameters 

The marginalised probability distributions of the cosmological pa- 
rameters obtained from the COG sampler are shown in Fig. 3. We 
do not plot the corresponding distributions for the COSMO-MC sam- 
pler, because it was only able to produce a total of 1500 accepted 
post bum-in samples from 70000 sampler calls for each of the eight 
chains. This results in the corresponding marginalised distributions 
being dominated by sampling error. 

We note that the distributions for each parameter produced 
by the COG sampler contain the corresponding true value at high 
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Figure 6. As in Fig. 3, but for data set 2 and using the COG sampler only. 

probability. For cOb, coj^, n^, Zre and As the distributions are rea- 
sonably smooth. However, the marginalised distributions for h, Q.^, 
Sim and contain some additional features. In particular, we note 
that each of these distributions contains a 'drop-out' ; these occur 
at /i 0.6 and h ^ 0.8, Q.^ ^ -0.05, ^ 0.4 and ^ 0.6. 
These features are, in fact, the projections of a single feature in 
the multidimensional parameter space defining the geometrical de- 
generacy direction. We have separately examined the marginalised 
distributions produced for each of the eight chains, and found that 
all of them contain these features. It is therefore most likely that 
this feature is an artifact in the CAMB code. More importantly, set- 
ting aside these drop-outs, we see that the geometrical degeneracy 
is well-sampled, leading to wide tails on the marginalised distri- 
butions for £i.ra and fl^- This confirms the well-known result 
that, even with CMB data of Planck quality, the spatial curvature 
of the universe cannot be well-constrained without the inclusion of 
informative priors. 



5 DISCUSSION AND CONCLUSIONS 

We have presented a fast Markov-chain Monte Carlo sampler tai- 
lored to the problem of estimating cosmological parameters from 
measurements of the CMB total intensity power spectrum. This 
sampler employs a number of strategies to avoid the difficulties 
encountered in the use of the standard multivariate Gaussian pro- 
posal function used, for example, in the COSMO-mc software pack- 
age (Lewis & Bridle 2002). In particular, it achieves rapid conver- 
gence and produces reliable confidence limits by using dynamic 
widths for proposal distributions, dynamic covariance matrix sam- 
pling, and a dedicated proposal distribution for moving along weU- 
known degeneracy directions. The new sampler module (called 



COG ) is trivially substituted for the existing sampler in the COSMO- 
MC software. In Section 4, we test the COG sampler on the cur- 
rent CMB data set and a simulated data set of the quality expected 
from the Planck mission. In each case, the sampler produces re- 
liable marginalised distributions with considerably fewer sampler 
calls than the native cosmo-mc sampler. 

In future work, we intend to enhance the efficiency of the COG 
sampler still further by allowing communication between chains. 
At present running several independent chains on different proces- 
sors of a Beowulf cluster provides a useful means of diagnosing 
convergence, but does not take advantage of the possibility of shar- 
ing the information obtained by the chains on the shape of the tar- 
get density. In particular, we are investigating cross-chain proposals 
based on genetic algorithms (see e.g. Davis 1991). 
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